Text analysis
Awesome summary
This section on text analysis is divided into two parts: wordclouds and sentiment analysis. Both the extracted wiki pages and the character dialogue are used, and we investigate how the wordclouds and the sentiment analysis differ between the two data sets.
Wordclouds
First, we will take a look at wordclouds. As mentioned before, both the extracted wiki pages and the full series dialogue will be investigated. We will start by generating wordclouds for characters of interest. Here, we have selected the characters Jon Snow, Arya Stark, Bronn, Brienne of Tarth and Jaime Lannister. The first step in generating the wordclouds is to compute the term frequency-inverse document frequency (TF-IDF) for each text corpus, i.e. the wiki pages and the episode dialogue. For further explanation of TF-IDF and its computation we refer to the Explainer Notebook. It should be mentioned that we have removed all characters' names from the text corpus, as these would not be very descriptive of a character in a wordcloud or during sentiment analysis.
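As a rough illustration of the TF-IDF step described above, the following is a minimal sketch and not the implementation from the Explainer Notebook. It assumes each document (a character's wiki page or pooled dialogue) has already been tokenized into a list of words, and it uses one common TF-IDF variant (relative term frequency times the natural log of the inverse document frequency); the notebook may use a different weighting. The function names `tf_idf` and `remove_names` are hypothetical.

```python
import math
from collections import Counter


def remove_names(tokens, names):
    """Drop tokens that match any of the given character names (case-insensitive)."""
    lowered = {name.lower() for name in names}
    return [token for token in tokens if token.lower() not in lowered]


def tf_idf(documents):
    """Compute TF-IDF scores.

    documents: dict mapping a document name to its list of tokens.
    Returns a dict mapping each document name to {term: score}.
    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(documents)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in documents.values():
        doc_freq.update(set(tokens))

    scores = {}
    for name, tokens in documents.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores
```

With this variant, a term that occurs in every document gets a score of zero, which is why words shared by all characters do not dominate a wordcloud. The resulting `{term: score}` dictionaries could then be passed to a wordcloud generator as word weights.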
Now, let's take a look at the generated wordclouds for the selected characters.
Wordclouds based on character wiki page & dialogue
When comparing the wordclouds generated from the two data sets, it should be noted that, for the most part, the same words do not appear for a given character. This is expected: the text from a character's wiki page is more descriptive of the character and their place in the story, whereas the wordcloud from the dialogue shows exactly that, namely the character's most descriptive words, according to TF-IDF, used throughout the series. It would be interesting to compare this with the sentiment analysis, which is the second part of this page.
Wordclouds based on selected houses
Next, we will generate wordclouds based on the characters' allegiance. This is done by pooling the dialogue text of characters belonging to the same allegiance and, again, computing the respective TF-IDF scores in order to generate the wordclouds. For this, we have selected the houses Stark, Lannister, Targaryen and Greyjoy, and the independent group The Night's Watch. It would be interesting to see whether the house mottos appear in these wordclouds. The respective house mottos are:
House Stark: Winter is coming
House Lannister: Hear Me Roar!
House Targaryen: Fire and Blood
House Greyjoy: We Do Not Sow
As the Night's Watch is not a House but rather a brotherhood sworn to protect The Wall, they do not have a motto.
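The pooling step and the motto check can be sketched as follows. This is only an illustration under stated assumptions: the dialogue is assumed to be available as a mapping from character to a list of tokens, and the allegiance as a mapping from character to group name. The helper names `pool_by_allegiance` and `motto_words_present` are hypothetical and do not come from the project code.

```python
from collections import defaultdict


def pool_by_allegiance(dialogue, allegiance):
    """Merge the tokens of all characters that share an allegiance.

    dialogue:   dict mapping character name -> list of tokens
    allegiance: dict mapping character name -> group name
    Characters without a known allegiance are skipped.
    """
    pooled = defaultdict(list)
    for character, tokens in dialogue.items():
        group = allegiance.get(character)
        if group is not None:
            pooled[group].extend(tokens)
    return dict(pooled)


def motto_words_present(motto, top_terms):
    """Return the words of a motto that occur among the given terms (case-insensitive)."""
    terms = {term.lower() for term in top_terms}
    words = motto.lower().replace("!", "").split()
    return [word for word in words if word in terms]
```

Each pooled token list can be treated as one document when computing TF-IDF, and `motto_words_present` can then be applied to the highest-scoring terms of each group to check for motto words.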
When comparing the wordclouds above with the respective house mottos, only the Lannisters' Hear (large, in the middle of the wordcloud) is present. All the wordclouds are, however, very descriptive of the respective houses. For instance, for the Night's Watch, a military order sworn to protect The Wall, words like protect, wildling and swear are present. The same can be said for House Targaryen, whose main character, Daenerys, is married to a Dothraki warlord and later in the show becomes a leader of the Dothraki people herself.
Wordclouds based on seasons
We will now generate wordclouds based on the season sections of the wiki pages. It would be interesting to see how these wordclouds change as the story unfolds. It would also be interesting to investigate whether the overall theme of the series changes over its course and whether this can be seen in the wordclouds.
Taking the wordclouds generated for seasons 1 and 8 as examples, the emphasized words seem very descriptive of their respective seasons. Starting with season 1:
- execute, behead : One of the main events of season 1 is the execution of Lord Eddard Stark, the head of House Stark. By the unexpected command of King Joffrey Baratheon, he is beheaded in the middle of King's Landing.
- Khal, bloodrider : Another of the main story arcs is that of Daenerys Targaryen, which takes place in a foreign land. In season 1, Daenerys is married off to a powerful Khal, Khal Drogo, in a trade made by her brother. A Khal has three bloodriders who are to live and die by the life of their Khal. It makes sense that the words Khal and bloodrider are so prominent, as they are key roles in Daenerys' story arc.
Comparing the wordclouds of season 1 and season 8, it appears season 8 has different key words. For season 8:
- celebrate : The word celebrate stands in stark contrast to the prominent word suffer from season 1. This could be because season 8 is the final season of the series, so its characters are celebrating the story ending on a happy note (for some of the characters).
- reunite : The story culminates in the final season, and many characters who have been separated throughout the show are finally reunited, hence the emphasis on the word reunite makes sense.
It should also be noted that the word destroy is present in the majority of the wordclouds; it is absent only from the wordclouds for seasons 1 and 3.
Sentiment of characters
In this second part of the text analysis, we will do a sentiment analysis of the characters, again based on both their wiki pages and their dialogue in the series. As we saw for the selected characters, the wordclouds based on the wiki pages differed considerably from those based on the dialogue. It would be interesting to see whether this also results in a different sentiment level for a character. Additionally, we will do a sentiment analysis of the different seasons of the series. Perhaps it can be determined whether any of the seasons differ significantly in sentiment.
For the sentiment analysis, we will apply both the dictionary-based method of LabMT and the rule- and dictionary-based method of VADER. For further explanation of how these sentiment scores are computed and of the difference between the two methods, we again refer to the Explainer Notebook. It should be noted that the scores of the two methods are on different scales: LabMT scores sentiment on a scale from 1 to 9, while VADER scores in the range -1 to 1. For LabMT, a score of 5 is considered neutral, while for VADER a score within the range -0.05 to 0.05 is considered neutral.
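To make the two scales concrete, here is a minimal sketch under stated assumptions; it is not the code from the Explainer Notebook. The LabMT-style score is taken to be the plain average of the happiness ratings of the words found in the lexicon, and the VADER label uses the thresholds commonly cited in the VADER documentation, where the boundary values themselves count as positive or negative. The function names are hypothetical, and the lexicon is supplied by the caller (the real LabMT word list is not included here).

```python
def labmt_score(tokens, lexicon):
    """Average happiness of the tokens that appear in the lexicon.

    lexicon: dict mapping word -> happiness rating on the 1 to 9 scale.
    Returns None when no token is covered by the lexicon.
    """
    ratings = [lexicon[token] for token in tokens if token in lexicon]
    if not ratings:
        return None
    return sum(ratings) / len(ratings)


def vader_label(compound):
    """Map a VADER compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"
```

Because the LabMT-style score is an average over many words, long texts tend to end up close to the neutral value of 5, which is worth keeping in mind when comparing it with the VADER scores.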
Sentiment analysis on character wiki pages
This subsection investigates the sentiment of each character based on their wiki page. We will further compare this with the sentiment of the characters based on their dialogue.
From the figure below it can be seen that the two methods, again, do not agree completely, although they yield approximately the same result. As before, the figure displays the 10 happiest and saddest characters according to LabMT and VADER.
At first glance, the VADER scores for the happiest characters are lower than in the previous part, whereas the saddest characters achieve almost the same scores. The LabMT results are quite similar in sentiment level. Again, many characters are found in both results, such as Septa, Moro, Orell and Polliver.
When comparing with the results based on the character dialogue, not many characters are found in all four results. This could indicate that the wiki pages and the dialogue do not contain the same information, or that the words chosen on the wiki pages do not necessarily convey information about a character's sentiment.
It would be expected that the dialogue contains a greater variety of words that can explain a character's mood, whereas the wiki pages contain words that describe the character and their actions. We also notice that the variation in VADER sentiment scores is far greater when using the dialogue than when using the wiki pages, which could be an indication that our hypothesis is true.
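The comparisons discussed above (the 10 happiest and saddest characters, the characters shared between results, and the spread of the scores) can be sketched as follows. This is an illustrative sketch only: it assumes the sentiment scores are stored as a mapping from character name to score, and the helper names `top_and_bottom` and `shared_characters` are hypothetical.

```python
from statistics import pvariance


def top_and_bottom(scores, n=10):
    """Return the n highest-scoring and n lowest-scoring characters.

    scores: dict mapping character name -> sentiment score.
    Both lists are ordered from higher to lower score.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n], ranked[-n:]


def shared_characters(*rankings):
    """Return the characters that occur in every one of the given rankings."""
    return set.intersection(*(set(ranking) for ranking in rankings))


def score_variance(scores):
    """Population variance of the sentiment scores, as a simple measure of spread."""
    return pvariance(scores.values())
```

Calling `shared_characters` on the four rankings (LabMT and VADER, wiki pages and dialogue) gives the overlap mentioned above, and comparing `score_variance` for the dialogue-based and wiki-based VADER scores is one simple way to quantify the difference in variation.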